11 research outputs found

    확률적인 3차원 자세 복원과 행동인식

    Get PDF
    학위논문 (박사)-- 서울대학교 대학원 : 전기·컴퓨터공학부, 2016. 2. 오성회.These days, computer vision technology becomes popular and plays an important role in intelligent systems, such as augment reality, video and image analysis, and to name a few. Although cost effective depth cameras, like a Microsoft Kinect, have recently developed, most computer vision algorithms assume that observations are obtained from RGB cameras, which make 2D observations. If, somehow, we can estimate 3D information from 2D observations, it might give better solutions for many computer vision problems. In this dissertation, we focus on estimating 3D information from 2D observations, which is well known as non-rigid structure from motion (NRSfM). More formally, NRSfM finds the three dimensional structure of an object by analyzing image streams with the assumption that an object lies in a low-dimensional space. However, a human body for long periods of time can have complex shape variations and it makes a challenging problem for NRSfM due to its increased degree of freedom. In order to handle complex shape variations, we propose a Procrustean normal distribution mixture model (PNDMM) by extending a recently proposed Procrustean normal distribution (PND), which captures the distribution of non-rigid variations of an object by excluding the effects of rigid motion. Unlike existing methods which use a single model to solve an NRSfM problem, the proposed PNDMM decomposes complex shape variations into a collection of simpler ones, thereby model learning can be more tractable and accurate. We perform experiments showing that the proposed method outperforms existing methods on highly complex and long human motion sequences. In addition, we extend the PNDMM to a single view 3D human pose estimation problem. While recovering a 3D structure of a human body from an image is important, it is a highly ambiguous problem due to the deformation of an articulated human body. Moreover, before estimating a 3D human pose from a 2D human pose, it is important to obtain an accurate 2D human pose. In order to address inaccuracy of 2D pose estimation on a single image and 3D human pose ambiguities, we estimate multiple 2D and 3D human pose candidates and select the best one which can be explained by a 2D human pose detector and a 3D shape model. We also introduce a model transformation which is incorporated into the 3D shape prior model, such that the proposed method can be applied to a novel test image. Experimental results show that the proposed method can provide good 3D reconstruction results when tested on a novel test image, despite inaccuracies of 2D part detections and 3D shape ambiguities. Finally, we handle an action recognition problem from a video clip. Current studies show that high-level features obtained from estimated 2D human poses enable action recognition performance beyond current state-of-the-art methods using low- and mid-level features based on appearance and motion, despite inaccuracy of human pose estimation. Based on these findings, we propose an action recognition method using estimated 3D human pose information since the proposed PNDMM is able to reconstruct 3D shapes from 2D shapes. Experimental results show that 3D pose based descriptors are better than 2D pose based descriptors for action recognition, regardless of classification methods. Considering the fact that we use simple 3D pose descriptors based on a 3D shape model which is learned from 2D shapes, results reported in this dissertation are promising and obtaining accurate 3D information from 2D observations is still an important research issue for reliable computer vision systems.Chapter 1 Introduction 1 1.1 Motivation 1 1.2 Research Issues 4 1.3 Organization of the Dissertation 6 Chapter 2 Preliminary 9 2.1 Generalized Procrustes Analysis (GPA) 11 2.2 EM-GPA Algorithm 12 2.2.1 Objective function 12 2.2.2 E-step 15 2.2.3 M-step 16 2.3 Implementation Considerations for EM-GPA 18 2.3.1 Preprocessing stage 18 2.3.2 Small update rate for the covariance matrix 20 2.4 Experiments 21 2.4.1 Shape alignment with the missing information 23 2.4.2 3D shape modeling 24 2.4.3 2D+3D active appearance models 28 2.5 Chapter Summary and Discussion 32 Chapter 3 Procrustean Normal Distribution Mixture Model 33 3.1 Non-Rigid Structure from Motion 35 3.2 Procrustean Normal Distribution (PND) 38 3.3 PND Mixture Model 41 3.4 Learning a PNDMM 43 3.4.1 E-step 44 3.4.2 M-step 46 3.5 Learning an Adaptive PNDMM 48 3.6 Experiments 50 3.6.1 Experimental setup 50 3.6.2 CMU Mocap database 53 3.6.3 UMPM dataset 69 3.6.4 Simple and short motions 74 3.6.5 Real sequence - qualitative representation 77 3.7 Chapter Summary 78 Chapter 4 Recovering a 3D Human Pose from a Novel Image 83 4.1 Single View 3D Human Pose Estimation 85 4.2 Candidate Generation 87 4.2.1 Initial pose generation 87 4.2.2 Part recombination 88 4.3 3D Shape Prior Model 89 4.3.1 Procrustean mixture model learning 89 4.3.2 Procrustean mixture model fitting 91 4.4 Model Transformation 92 4.4.1 Model normalization 92 4.4.2 Model adaptation 95 4.5 Result Selection 96 4.6 Experiments 98 4.6.1 Implementation details 98 4.6.2 Evaluation of the joint 2D and 3D pose estimation 99 4.6.3 Evaluation of the 2D pose estimation 104 4.6.4 Evaluation of the 3D pose estimation 106 4.7 Chapter Summary 108 Chapter 5 Application to Action Recognition 109 5.1 Appearance and Motion Based Descriptors 112 5.2 2D Pose Based Descriptors 113 5.3 Bag-of-Features with a Multiple Kernel Method 114 5.4 Classification - Kernel Group Sparse Representation 115 5.4.1 Group sparse representation for classification 116 5.4.2 Kernel group sparse (KGS) representation for classification 118 5.5 Experiment on sub-JHMDB Dataset 120 5.5.1 Experimental setup 120 5.5.2 3D pose based descriptor 122 5.5.3 Experimental results 123 5.6 Chapter Summary 129 Chapter 6 Conclusion and Future Work 131 Appendices 135 A Proof of Propositions in Chapter 2 137 A.1 Proof of Proposition 1 137 A.2 Proof of Proposition 3 138 A.3 Proof of Proposition 4 139 B Calculation of p(XijDii) in Chapter 3 141 B.1 Without the Dirac-delta term 141 B.2 With the Dirac-delta term 142 C Procrustean Mixture Model Learning and Fitting in Chapter 4 145 C.1 Procrustean Mixture Model Learning 145 C.2 Procrustean Mixture Model Fitting 147 Bibliography 153 초 록 167Docto

    Synthetic Source Universal Domain Adaptation through Contrastive Learning

    No full text
    Universal domain adaptation (UDA) is a crucial research topic for efficient deep learning model training using data from various imaging sensors. However, its development is affected by unlabeled target data. Moreover, the nonexistence of prior knowledge of the source and target domain makes it more challenging for UDA to train models. I hypothesize that the degradation of trained models in the target domain is caused by the lack of direct training loss to improve the discriminative power of the target domain data. As a result, the target data adapted to the source representations is biased toward the source domain. I found that the degradation was more pronounced when I used synthetic data for the source domain and real data for the target domain. In this paper, I propose a UDA method with target domain contrastive learning. The proposed method enables models to leverage synthetic data for the source domain and train the discriminativeness of target features in an unsupervised manner. In addition, the target domain feature extraction network is shared with the source domain classification task, preventing unnecessary computational growth. Extensive experimental results on VisDa-2017 and MNIST to SVHN demonstrated that the proposed method significantly outperforms the baseline by 2.7% and 5.1%, respectively

    Building a Compact Convolutional Neural Network for Embedded Intelligent Sensor Systems Using Group Sparsity and Knowledge Distillation

    No full text
    As artificial intelligence (AI)- or deep-learning-based technologies become more popular, the main research interest in the field is not only on their accuracy, but also their efficiency, e.g., the ability to give immediate results on the users’ inputs. To achieve this, there have been many attempts to embed deep learning technology on intelligent sensors. However, there are still many obstacles in embedding a deep network in sensors with limited resources. Most importantly, there is an apparent trade-off between the complexity of a network and its processing time, and finding a structure with a better trade-off curve is vital for successful applications in intelligent sensors. In this paper, we propose two strategies for designing a compact deep network that maintains the required level of performance even after minimizing the computations. The first strategy is to automatically determine the number of parameters of a network by utilizing group sparsity and knowledge distillation (KD) in the training process. By doing so, KD can compensate for the possible losses in accuracy caused by enforcing sparsity. Nevertheless, a problem in applying the first strategy is the unclarity in determining the balance between the accuracy improvement due to KD and the parameter reduction by sparse regularization. To handle this balancing problem, we propose a second strategy: a feedback control mechanism based on the proportional control theory. The feedback control logic determines the amount of emphasis to be put on network sparsity during training and is controlled based on the comparative accuracy losses of the teacher and student models in the training. A surprising fact here is that this control scheme not only determines an appropriate trade-off point, but also improves the trade-off curve itself. The results of experiments on CIFAR-10, CIFAR-100, and ImageNet32 × 32 datasets show that the proposed method is effective in building a compact network while preventing performance degradation due to sparsity regularization much better than other baselines

    Spatio-Temporal Representation of an Electoencephalogram for Emotion Recognition Using a Three-Dimensional Convolutional Neural Network

    No full text
    Emotion recognition plays an important role in the field of human–computer interaction (HCI). An electroencephalogram (EEG) is widely used to estimate human emotion owing to its convenience and mobility. Deep neural network (DNN) approaches using an EEG for emotion recognition have recently shown remarkable improvement in terms of their recognition accuracy. However, most studies in this field still require a separate process for extracting handcrafted features despite the ability of a DNN to extract meaningful features by itself. In this paper, we propose a novel method for recognizing an emotion based on the use of three-dimensional convolutional neural networks (3D CNNs), with an efficient representation of the spatio-temporal representations of EEG signals. First, we spatially reconstruct raw EEG signals represented as stacks of one-dimensional (1D) time series data to two-dimensional (2D) EEG frames according to the original electrode position. We then represent a 3D EEG stream by concatenating the 2D EEG frames to the time axis. These 3D reconstructions of the raw EEG signals can be efficiently combined with 3D CNNs, which have shown a remarkable feature representation from spatio-temporal data. Herein, we demonstrate the accuracy of the emotional classification of the proposed method through extensive experiments on the DEAP (a Dataset for Emotion Analysis using EEG, Physiological, and video signals) dataset. Experimental results show that the proposed method achieves a classification accuracy of 99.11%, 99.74%, and 99.73% in the binary classification of valence and arousal, and, in four-class classification, respectively. We investigate the spatio-temporal effectiveness of the proposed method by comparing it to several types of input methods with 2D/3D CNN. We then verify the best performing shape of both the kernel and input data experimentally. We verify that an efficient representation of an EEG and a network that fully takes advantage of the data characteristics can outperform methods that apply handcrafted features

    Occluded Pedestrian-Attribute Recognition for Video Sensors Using Group Sparsity

    No full text
    Pedestrians are often obstructed by other objects or people in real-world vision sensors. These obstacles make pedestrian-attribute recognition (PAR) difficult; hence, occlusion processing for visual sensing is a key issue in PAR. To address this problem, we first formulate the identification of non-occluded frames as temporal attention based on the sparsity of a crowded video. In other words, a model for PAR is guided to prevent paying attention to the occluded frame. However, we deduced that this approach cannot include a correlation between attributes when occlusion occurs. For example, “boots” and “shoe color” cannot be recognized simultaneously when the foot is invisible. To address the uncorrelated attention issue, we propose a novel temporal-attention module based on group sparsity. Group sparsity is applied across attention weights in correlated attributes. Accordingly, physically-adjacent pedestrian attributes are grouped, and the attention weights of a group are forced to focus on the same frames. Experimental results indicate that the proposed method achieved 1.18% and 6.21% higher F1-scores than the advanced baseline method on the occlusion samples in DukeMTMC-VideoReID and MARS video-based PAR datasets, respectively

    Shallow and reverse attention network for colon polyp segmentation

    No full text
    Abstract Polyp segmentation is challenging because the boundary between polyps and mucosa is ambiguous. Several models have considered the use of attention mechanisms to solve this problem. However, these models use only finite information obtained from a single type of attention. We propose a new dual-attention network based on shallow and reverse attention modules for colon polyps segmentation called SRaNet. The shallow attention mechanism removes background noise while emphasizing the locality by focusing on the foreground. In contrast, reverse attention helps distinguish the boundary between polyps and mucous membranes more clearly by focusing on the background. The two attention mechanisms are adaptively fused using a “Softmax Gate”. Combining the two types of attention enables the model to capture complementary foreground and boundary features. Therefore, the proposed model predicts the boundaries of polyps more accurately than other models. We present the results of extensive experiments on polyp benchmarks to show that the proposed method outperforms existing models on both seen and unseen data. Furthermore, the results show that the proposed dual attention module increases the explainability of the model

    Reconstruct as Far as You Can: Consensus of Non-Rigid Reconstruction from Feasible Regions

    No full text

    Bacillus anthracis Capsule Activates Caspase-1 and Induces Interleukin-1β Release from Differentiated THP-1 and Human Monocyte-Derived Dendritic Cells▿ †

    No full text
    The poly-γ-d-glutamic acid (PGA) capsule is one of the major virulence factors of Bacillus anthracis, which causes a highly lethal infection. The antiphagocytic PGA capsule disguises the bacilli from immune surveillance and allows unimpeded growth of bacilli in the host. Recently, efforts have been made to include PGA as a component of anthrax vaccine; however, the innate immune response of PGA itself has been poorly investigated. In this study, we characterized the innate immune response elicited by PGA in the human monocytic cell line THP-1, which was differentiated into macrophages with phorbol 12-myristate 13-acetate (PMA) and human monocyte-derived dendritic cells (hMoDCs). PGA capsules were isolated from the culture supernatant of either the pXO1-cured strain of B. anthracis H9401 or B. licheniformis ATCC 9945a. PGA treatment of differentiated THP-1 cells and hMoDCs led to the specific extracellular release of interleukin-1β (IL-1β) in a dose-dependent manner. Evaluation of IL-1β processing by Western blotting revealed that cleaved IL-1β increased in THP-1 cells and hMoDCs after PGA treatment. Enhanced processing of IL-1β directly correlated with increased activation of its upstream regulator, caspase-1, also known as IL-1β-converting enzyme (ICE). The extracellular release of IL-1β in response to PGA was ICE dependent, since the administration of an ICE inhibitor prior to PGA treatment blocked induction of IL-1β. These results demonstrate that B. anthracis PGA elicits IL-1β production through activation of ICE in PMA-differentiated THP-1 cells and hMoDCs, suggesting the potential for PGA as a therapeutic target for anthrax
    corecore